MG205: Econometrics Theory and Applications

Topic 8: Exploiting Time Variation

José Ignacio González Rojas

London School of Economics and Political Science

February 9, 2026

Cross-Sectional Data Cannot Separate Heterogeneity from Treatment Effects

Panel Data Gives Us New Tools to Address Endogeneity

The problem

  • Endogeneity: \(\text{Cov}(x_{it}, e_{it}) \neq 0\)
  • Violation of Assumption 5 \(\Rightarrow\) no identification
  • OLS gives biased estimates of the parameters of interest
  • Cross-sectional data alone cannot fix this

Today

  • Assume a particular error structure: \(e_{it} = \alpha_i + u_{it}\)
  • With panel data, construct estimators invariant to \(\alpha_i\)

Following the same units over time enables new identification and estimation strategies.

Two Estimators Remove Unit-Level Unobserved Heterogeneity

First Differences and LSDV

First Differences (FD)

  • Population model: \(y_{it} = \beta x_{it} + \alpha_i + u_{it}\)
  • Subtract consecutive observations:

\[\Delta y_{it} = \beta \Delta x_{it} + \Delta u_{it}\]

  • \(\alpha_i - \alpha_i = 0\): unobserved heterogeneity disappears

Least Squares Dummy Variables (LSDV)

  • Include a dummy for each unit \(i\):

\[y_{it} = \beta x_{it} + \sum_{j=1}^{N} \gamma_j \mathbb{1}[i=j] + u_{it}\]

  • The dummies absorb \(\alpha_i\)
  • Equivalent to FD for \(T=2\) (we prove this later)
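The equivalence for \(T=2\) can be checked numerically. The sketch below is a simulation under assumed values (500 units, \(\beta = 2\), and \(x_{it}\) built to be correlated with \(\alpha_i\)); none of these numbers come from the course data. LSDV is computed through unit demeaning, which by the Frisch-Waugh-Lovell theorem gives the same slope as including a full set of unit dummies.

```python
import random

random.seed(0)
N, BETA = 500, 2.0  # assumed sample size and true slope

# Simulate a two-period panel in which x is correlated with the unit effect alpha_i.
panel = []  # one (x1, y1, x2, y2) tuple per unit
for _ in range(N):
    alpha = random.gauss(0, 1)
    obs = []
    for _t in range(2):
        x = 0.8 * alpha + random.gauss(0, 1)
        y = BETA * x + alpha + random.gauss(0, 1)
        obs += [x, y]
    panel.append(tuple(obs))

def slope_with_intercept(xs, ys):
    """OLS slope from a regression of y on x and a constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

def slope_no_intercept(xs, ys):
    """OLS slope from a regression of y on x without a constant."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Pooled OLS: alpha_i stays in the error, so the slope is biased.
xs = [v for x1, _, x2, _ in panel for v in (x1, x2)]
ys = [v for _, y1, _, y2 in panel for v in (y1, y2)]
b_pooled = slope_with_intercept(xs, ys)

# First differences: alpha_i cancels.
dx = [x2 - x1 for x1, _, x2, _ in panel]
dy = [y2 - y1 for _, y1, _, y2 in panel]
b_fd = slope_no_intercept(dx, dy)

# LSDV via the within transformation: subtract each unit's own mean.
wx, wy = [], []
for x1, y1, x2, y2 in panel:
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    wx += [x1 - mx, x2 - mx]
    wy += [y1 - my, y2 - my]
b_lsdv = slope_no_intercept(wx, wy)

print(f"pooled OLS: {b_pooled:.3f}")
print(f"FD:         {b_fd:.3f}")
print(f"LSDV:       {b_lsdv:.3f}")
```

In this design the FD and LSDV slopes coincide up to floating-point rounding, while pooled OLS is pushed away from the true slope by the correlation between \(x_{it}\) and \(\alpha_i\).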

Exercise 1: Unobserved Heterogeneity Biases Cross-Sectional Estimates

Airline Fares Depend on Unobserved Route and Time Characteristics

Two Sources of Omitted Variable Bias

\[ \begin{aligned} \log(\text{fare})_{it} &= \beta_0 + \beta_1\log(\text{distance})_i + \beta_2\text{competition}_{it} + e_{it} \\ e_{it} &= \gamma_i + \delta_t + u_{it} \end{aligned} \]

  • \(\gamma_i\): route-specific, time-invariant unobserved heterogeneity
    • Business relationships
    • Airport amenities
  • \(\delta_t\): common time shocks
    • Fuel prices
    • Economic conditions
  • \(u_{it}\): idiosyncratic error
  • We worry that \(\text{Cov}(\text{competition}_{it}, \gamma_i) \neq 0\)
    • Model not identified
    • \(\hat{\beta}\) might be biased


Controlling for Distance and Year Dummies Does Not Remove Route Heterogeneity

The Proposed Model Falls Short

Estimated model

\[\begin{align*} \widehat{\log(\text{fare})}_{it} &= \hat{\beta}_{0} + \hat{\beta}_{1}\log(\text{distance})_{i} \\ &+ \hat{\beta}_{2}\text{competition}_{it} \\ &+ \hat{\delta}_{1}\mathbb{1}[t=2007] \\ &+ \hat{\delta}_{2}\mathbb{1}[t=2012] \end{align*}\]

What remains in the error?

  • Recall: \(e_{it} = \gamma_i + \delta_t + u_{it}\)
  • The year dummies address common time trends (\(\delta_t\))
  • \(\gamma_i\) remains in the error

Since \(\text{Cov}(\text{competition}_{it}, \gamma_i) \neq 0\), OLS is biased.

First-Differencing Eliminates Route-Level Unobserved Heterogeneity

The First-Difference Estimator

  • First-difference operator: \(\Delta x_{it} = x_{it} - x_{it-1}\)
  • Example: Take the USA–UK route. Subtract 2002 from 2007, and 2007 from 2012.

\[ \Delta\log(\text{fare})_{it} = \beta_2\Delta\text{competition}_{it} + \Delta\delta_t + \Delta u_{it} \]

\(\gamma_i - \gamma_i = 0\): time-invariant route characteristics disappear.

With year dummies for transition periods (2002–2007 base, 2007–2012):

\[ \widehat{\Delta\log(\text{fare})}_{it} = \hat\alpha + \hat\beta_2\Delta\text{competition}_{it} + \hat\delta\mathbb{1}[\text{transition } 2007-2012] \]
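For reference, the differenced equation follows from writing the model in two consecutive survey years and subtracting. This is a sketch of the algebra using only the symbols defined above, with \(t-1\) denoting the previous survey year:

\[\begin{align*} \log(\text{fare})_{it} &= \beta_0 + \beta_1\log(\text{distance})_i + \beta_2\text{competition}_{it} + \gamma_i + \delta_t + u_{it} \\ \log(\text{fare})_{it-1} &= \beta_0 + \beta_1\log(\text{distance})_i + \beta_2\text{competition}_{it-1} + \gamma_i + \delta_{t-1} + u_{it-1} \\ \Delta\log(\text{fare})_{it} &= \beta_2\Delta\text{competition}_{it} + (\delta_t - \delta_{t-1}) + (u_{it} - u_{it-1}) \end{align*}\]

The constant \(\beta_0\), the distance term, and \(\gamma_i\) all cancel. Under this reading, the intercept \(\hat\alpha\) in the estimated equation corresponds to \(\delta_{2007} - \delta_{2002}\), and \(\hat\alpha + \hat\delta\) corresponds to \(\delta_{2012} - \delta_{2007}\).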


Combining FD with Year Dummies Addresses Both Sources

Two Strategies for Two-Way Heterogeneity

FD removes \(\gamma_i\) (unit FE)

  • Subtract consecutive observations
  • \(\gamma_i - \gamma_i = 0\)
  • Time-invariant variables also drop out: \(\Delta\log(\text{distance})_i = 0\)

Year dummies absorb \(\delta_t\) (time FE)

  • Include dummies for transition periods
  • Common time shocks captured
  • This is LSDV applied to time effects

(1) FD + year dummies, or (2) full LSDV with dummies for both units and time periods.

Rejecting the Null Does Not Validate the Model

The Trap

  • With robust \(t\)-statistics, we reject \(H_{0}: \beta_{\text{competition}} = 0\)
  • But the test is only valid if the model is correctly specified
  • If OVB remains (route-level heterogeneity not addressed), \(\hat{\beta}\) is biased
  • Statistical significance \(\neq\) valid causal interpretation
  • Without identification, the estimate is only a linear projection coefficient
  • FD reduces bias from time-invariant confounders but does not eliminate all sources of bias

Exercise 2: Time Effects Capture Industry-Wide Patent Growth

Industry-Wide Patent Growth Requires Flexible Time Effects

Setting

  • 37 pharmaceutical firms
  • 2005-2007
  • No OVB concerns
    • Causal interpretation
  • Patents growing industry-wide
    • Regardless of individual firm R&D

Model

\[\begin{align*} \log(\text{patents})_{it} &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ &+ \beta_2\mathbb{1}[t=2006] + \beta_3\mathbb{1}[t=2007] \\ &+ e_{it} \end{align*}\]

  • Could model the trend linearly or quadratically
  • Year dummies allow any form — nonparametric
  • \(\beta_{1}\): elasticity of patents w.r.t. R&D (causal)

Year Dummy Coefficients Measure Growth Rates

Conditional Expectations Are The Tool to Interpret

\[\begin{align*} \mathbb{E}[\log(\text{patents})_{it} \mid t=2005] &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2006] &= (\beta_0 + \beta_2) + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2007] &= (\beta_0 + \beta_3) + \beta_1\log(\text{R\&D})_{it} \end{align*}\]

  • Average log change in patents across all firms
  • Geometric mean growth rate of patents in the industry, holding R&D constant
    • \(\beta_2\): 2005 to 2006
    • \(\beta_3\): 2005 to 2007

Year dummies measure growth rates between periods — not “the level in 2006 vs 2005.”
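As a rough illustration of reading these coefficients, the sketch below converts a year-dummy coefficient from a log-outcome model into the exact percentage change it implies, \(100\,(e^{\beta} - 1)\). The coefficient values are hypothetical and are not estimates from the patent data.

```python
import math

def log_points_to_percent(coef):
    """Exact percentage change implied by a dummy coefficient in a log-outcome model."""
    return 100 * (math.exp(coef) - 1)

# Hypothetical year-dummy coefficients (assumed values, not estimates).
beta_2 = 0.05  # 2006 relative to 2005
beta_3 = 0.12  # 2007 relative to 2005

growth_2006 = log_points_to_percent(beta_2)
growth_2007 = log_points_to_percent(beta_3)
print(f"2005 to 2006: {growth_2006:.2f}%")
print(f"2005 to 2007: {growth_2007:.2f}%")
```

For small coefficients the log-point value and the exact percentage are close; the gap widens as the coefficient grows.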

Interactions Allow the R&D Elasticity to Vary Over Time

Heterogeneous Elasticities

\[ \begin{align*} \log(\text{patents})_{it} &= \beta_0 + \beta_1\log(\text{R\&D})_{it} + \beta_2\mathbb{1}[t=2006] + \beta_3\mathbb{1}[t=2007] \\ &+ \beta_4(\log(\text{R\&D})_{it} \times \mathbb{1}[t=2006]) + \beta_5(\log(\text{R\&D})_{it} \times \mathbb{1}[t=2007]) \\ &+ e_{it} \end{align*} \]

Conditional means by year

\[\begin{align*} \mathbb{E}[\log(\text{patents})_{it} \mid t=2005] &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2006] &= (\beta_0 + \beta_2) + (\beta_1 + \beta_4)\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2007] &= (\beta_0 + \beta_3) + (\beta_1 + \beta_5)\log(\text{R\&D})_{it} \end{align*}\]

Level Shifts and Slope Shifts Are Separately Identified

Decomposing Differential Effects

| Year | Intercept | Elasticity |
|------|-----------|------------|
| 2005 | \(\beta_0\) | \(\beta_1\) |
| 2006 | \(\beta_0 + \beta_2\) | \(\beta_1 + \beta_4\) |
| 2007 | \(\beta_0 + \beta_3\) | \(\beta_1 + \beta_5\) |

How the patents-R&D elasticity changes over time

  • \(\beta_4\): elasticity change 2005 \(\to\) 2006
  • \(\beta_5\): elasticity change 2005 \(\to\) 2007

No need to interpret \(\beta_4\) and \(\beta_5\) individually; the conditional means do the work.
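The mapping from coefficients to year-specific intercepts and elasticities can be written as a small helper. This is a minimal sketch: the function name `year_profile` and all coefficient values are hypothetical, chosen only to show the bookkeeping.

```python
# Hypothetical coefficients beta_0 ... beta_5 (assumed values, for illustration only).
beta = [1.00, 0.50, 0.05, 0.12, 0.10, -0.05]

def year_profile(year, b):
    """Return (intercept, elasticity) implied by the interacted model for a given year."""
    d2006 = int(year == 2006)
    d2007 = int(year == 2007)
    intercept = b[0] + b[2] * d2006 + b[3] * d2007
    elasticity = b[1] + b[4] * d2006 + b[5] * d2007
    return intercept, elasticity

profiles = {year: year_profile(year, beta) for year in (2005, 2006, 2007)}
for year, (intercept, elasticity) in profiles.items():
    print(year, round(intercept, 2), round(elasticity, 2))
```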

Exercise 3: Empirical Models Interact All Relevant Variables

Gender Wage Gaps Changed After the Mining Boom

Three Patterns from the Data

  • Gender: Men earn a constant wage premium over women
  • Time trend: Wages grow over time for both groups
  • Structural break (2005): After the mining company arrives, the male premium widens

The Empirical Model Interacts Gender, Time, and Post-2005

Eight Coefficients for Four Groups

\[ \begin{align*} \log(\text{wages})_{it} &= \beta_0 + \beta_1\mathbb{1}[i\text{ is male}] + \beta_2 t + \beta_3\mathbb{1}[t \geq 2005] \\ &+ \beta_4(\mathbb{1}[i\text{ is male}] \times t) + \beta_5(\mathbb{1}[i\text{ is male}] \times \mathbb{1}[t \geq 2005]) \\ &+ \beta_6(t \times \mathbb{1}[t \geq 2005]) + \beta_7(t \times \mathbb{1}[t \geq 2005] \times \mathbb{1}[i\text{ is male}]) \\ &+ e_{it} \end{align*} \]

This model captures level differences, trends, and how both changed after 2005, separately for men and women.

Conditional Means: Pre-2005

Women and Men Before the Structural Break

Women before 2005 (base category)

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=0,t<2005] = \beta_0 + \beta_2 t \]

Men before 2005

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=1,t<2005] = (\beta_0 + \beta_1) + (\beta_2 + \beta_4)t \]

\(\beta_1\) shifts the intercept; \(\beta_4\) shifts the slope.

Conditional Means: Post-2005

Women and Men After the Structural Break

Women after 2005

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=0,\; t \geq 2005] = (\beta_0 + \beta_3) + (\beta_2 + \beta_6)t\]

Men after 2005

\[\begin{align*} \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=1,\; t \geq 2005] &= (\beta_0 + \beta_1 + \beta_3 + \beta_5) \\ &\quad + (\beta_2 + \beta_4 + \beta_6 + \beta_7)t \end{align*}\]

Each coefficient modifies either the intercept or slope for a specific group-period combination.

Taking Differences Isolates Each Coefficient’s Role

Condition on Group, Then Difference
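The differences can be worked out directly from the four conditional means above. The following is a sketch of that algebra, writing \(\mathbb{E}[\cdot \mid \text{men, pre}]\) as shorthand for the conditional mean of log wages for men before 2005, and similarly for the other groups:

\[\begin{align*} \mathbb{E}[\cdot \mid \text{men, pre}] - \mathbb{E}[\cdot \mid \text{women, pre}] &= \beta_1 + \beta_4 t \\ \mathbb{E}[\cdot \mid \text{men, post}] - \mathbb{E}[\cdot \mid \text{women, post}] &= (\beta_1 + \beta_5) + (\beta_4 + \beta_7)t \\ \mathbb{E}[\cdot \mid \text{women, post}] - \mathbb{E}[\cdot \mid \text{women, pre}] &= \beta_3 + \beta_6 t \\ \text{(post-2005 male premium)} - \text{(pre-2005 male premium)} &= \beta_5 + \beta_7 t \end{align*}\]

The first line is the male premium before the break, the second is the premium after it, the third is the break for women, and the last line shows that \(\beta_5\) and \(\beta_7\) together describe how the premium changed.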

Coefficient Signs Follow Directly from the Figure

Economic Interpretation

Positive (\(> 0\))

  • \(\beta_1\): male premium
  • \(\beta_2\): wages grow over time
  • \(\beta_7\): male wages grow faster post-2005

Zero (\(= 0\))

  • \(\beta_3\): no level break for women at 2005
  • \(\beta_4\): same pre-2005 growth rate
  • \(\beta_6\): female growth unchanged post-2005

Negative (\(< 0\))

  • \(\beta_5\): the post-2005 intercept for men is lower than \(\beta_0 + \beta_1 + \beta_3\) would predict; because the intercept is measured at \(t = 0\), it offsets the steeper post-2005 male slope (\(\beta_7 > 0\))

Exercise 4: Panel Data Enables Identification and Estimation

Panel Data Follows the Same Units Over Time

Definition and Structure

  • \(y_{it}\), \(x_{it}\) for \(i = 1, \ldots, N\) and \(t = 1, \ldots, T\)
  • Panel data: same units observed across multiple time periods
  • Cross-section: one \(t\) only
  • Repeated cross-section: same population, different individuals each period

Error decomposition

\[ e_{it} = \alpha_i + v_{it} \]

Panel Structure Enables Identification of Parameters

Identification vs Estimation

Identification

  • Could we recover unique values for each parameter?
    • Cross-section: \(\alpha_i\) in error, correlated with \(x_{it}\) \(\to\) cannot identify \(\beta\)
    • Panel: difference out \(\alpha_i\)
  • Requires: \(\mathbb{E}[v_{it} \mid x_{it}] = 0\)

Estimation

  • Given identification, how do we compute \(\hat\beta\) from the data?

\[ \Delta y_i = \delta + \beta_1\Delta x_{i1} + \cdots + \beta_k\Delta x_{ik} + \Delta v_i \]

  • Requires \(T \geq 2\) and within-unit variation

Example: cannot estimate returns to education via FD if education does not change over time.

Panel Data Reduces OVB but Cannot Eliminate All Sources of Bias

Solves

  • Time-invariant OVB (\(\alpha_i\))
  • Unit-invariant OVB (\(\lambda_t\))

Does not solve

  • Time-varying confounders (in \(v_{it}\))
  • Measurement error
  • Selection bias

Less scope for OVB, but not zero.

Exercise 5: First-Differencing Amplifies Measurement Error

Education Is Measured with Error in Both Periods

True vs Observed Variables

\[ \text{education}_{it} = \text{education}^{*}_{it} + e_{it} \]

Assumptions

  • \(e_{it}\) uncorrelated with true education and other variables
  • Education varies little over time for adults

Cross-Sectional Attenuation Bias Shrinks the Coefficient Towards Zero

The Baseline Problem

Population model

\[ \log(\text{wage})_i = \alpha + \beta\text{education}^{*}_i + \epsilon_i \]

Observed model substitutes \(\text{education}_i = \text{education}^{*}_i + e_i\)

Attenuation bias

\[ \text{plim}\;\hat\beta = \beta \cdot \frac{\text{Var}(\text{educ}^*)}{\text{Var}(\text{educ}^*) + \text{Var}(e)} \]

The ratio is less than 1, so the coefficient is biased towards zero (derived in Topic 6).

First-Differencing Increases the Measurement Error Variance

The Panel Data Paradox

FD of observed education

\[ \Delta\text{education}_i = \Delta\text{education}^{*}_{i} + (e_{i2} - e_{i1}) \]

If \(e_{i1}\) and \(e_{i2}\) are uncorrelated

\[ \text{Var}(e_{i2} - e_{i1}) = \text{Var}(e_{i1}) + \text{Var}(e_{i2}) \]

Panel Data Involves a Fundamental Bias Trade-off

FD Eliminates Fixed Confounders but Amplifies Measurement Error

\[ \text{plim}\;\hat\beta_{\text{FD}} = \beta \cdot \frac{\text{Var}(\Delta\text{educ}^*)}{\text{Var}(\Delta\text{educ}^*) + \text{Var}(e_{i1}) + \text{Var}(e_{i2})} \]

Benefits

  • Eliminates \(\alpha_i\), reduces OVB from time-invariant confounders
  • Enables causal identification under strict exogeneity


Costs

  • Measurement error variance grows in denominator
  • Numerator small if \(x\) changes little over time
  • Attenuation ratio is smaller — more severe bias towards zero
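The trade-off can be illustrated with a small simulation. This is a sketch under assumed values: true education has standard deviation 2, changes between periods with standard deviation 0.5, and measurement error has standard deviation 1 in each period. To isolate the measurement-error channel, the simulated wage equation contains no unit effect \(\alpha_i\). With these numbers the attenuation ratios implied by the formulas are \(4/(4+1) = 0.8\) in the cross-section and \(0.25/(0.25+2) \approx 0.11\) in first differences.

```python
import random

random.seed(1)
N, BETA = 20_000, 0.1                       # assumed sample size and true return
SD_TRUE, SD_CHANGE, SD_ERR, SD_EPS = 2.0, 0.5, 1.0, 0.1  # assumed standard deviations

def slope(xs, ys):
    """OLS slope from a regression of y on x and a constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

educ_obs_1, wage_1, d_educ_obs, d_wage = [], [], [], []
for _ in range(N):
    true_1 = random.gauss(12, SD_TRUE)
    true_2 = true_1 + random.gauss(0, SD_CHANGE)   # education changes little
    obs_1 = true_1 + random.gauss(0, SD_ERR)       # observed = true + error
    obs_2 = true_2 + random.gauss(0, SD_ERR)
    w_1 = BETA * true_1 + random.gauss(0, SD_EPS)
    w_2 = BETA * true_2 + random.gauss(0, SD_EPS)
    educ_obs_1.append(obs_1)
    wage_1.append(w_1)
    d_educ_obs.append(obs_2 - obs_1)
    d_wage.append(w_2 - w_1)

# Estimated slope divided by the true slope = empirical attenuation ratio.
ratio_cs = slope(educ_obs_1, wage_1) / BETA
ratio_fd = slope(d_educ_obs, d_wage) / BETA

print(f"cross-section ratio: {ratio_cs:.3f} (formula: 0.800)")
print(f"first-difference ratio: {ratio_fd:.3f} (formula: {0.25 / 2.25:.3f})")
```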

Exercise 6: Panel Data Cannot Solve Selection Bias

Roommate Nationality May Affect Student Grades

Self-Selection into Rooms Creates Endogeneity

Let \(\text{same}_{it} = \mathbb{1}[i\text{ has same-nationality roommate in } t]\)

\[ \text{grades}_{it} = \alpha + \beta\;\text{same}_{it} + e_{it} \]

  • Exogeneity holds under random assignment of roommates
  • If students choose roommates: an omitted equation determines room selection
  • More outgoing students may prefer different nationalities and perform differently academically
  • \(\text{Cov}(\text{same}_{it}, e_{it}) \neq 0\) arises from selection, not unobserved heterogeneity

Two Years of Data Require Roommate Changes

Panel Structure and Exogenous Mobility

Year 1

\(\text{grades}_{i1} = \alpha + \beta\;\text{same}_{i1} + \alpha_i + u_{i1}\)

Year 2

\(\text{grades}_{i2} = \alpha + \beta\;\text{same}_{i2} + \alpha_i + u_{i2}\)

First-differencing

\(\Delta\text{grades}_i = \beta\;\Delta\text{same}_i + \Delta u_i\)

  • Critical: students must change roommates (\(\Delta\text{same}_i \neq 0\) for some)
  • Exogenous mobility design — plausible if the university reassigns rooms

The Problem Is Selection, Not Unobserved Heterogeneity

What Panel Data Cannot Fix

  • FD removes \(\alpha_i\) (unobserved ability) — but the core problem is selection into rooms
  • If reasons for changing roommates correlate with grade changes, FD does not help
  • Panel data addresses unobserved heterogeneity
  • It does not address selection bias

Random Assignment Plus Panel Data Strengthens Identification

You Get What You Pay For

  • Random assignment of roommates:
    • \(\beta\) is identified even in cross-section
    • Panel adds precision: removes \(\alpha_i\) from the error \(\to\) smaller variance \(\to\) smaller standard errors
  • Potential selection: panel data alone cannot solve the selection problem

Inference, Functional Forms, and Composition in Panel Data

We Run into an Old Friend!

Inference

  • AS2 violated: same unit observed repeatedly
  • AS7 may fail: heterogeneous units
  • Affects SEs, not \(\hat{\beta}\)

Functional Forms

  • Linear vs nonparametric time trends
  • Treatment absorbed by time FE
  • Unbalanced panels → composition effects

Exercise 7: Clustered Standard Errors Account for Within-Unit Correlation

Clustering Allows Arbitrary Within-Unit Correlation

A Conservative Fix for Panel Inference

What clustering does

  • Allows arbitrary correlation within cluster
  • Requires independence across clusters
  • Does not change \(\hat{\beta}\) — only the standard errors

Two sources of correlation

  • Within-unit persistence: worker earnings persist year to year
  • Within-group spillovers: departmental training affects all workers
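A minimal sketch of the calculation for a single regressor, under assumed values: 200 units observed for 5 periods, a regressor that varies only across units, and an error with a persistent unit component. The cluster-robust formula sums the scores \((x_{it} - \bar{x})\hat{e}_{it}\) within each unit before squaring; no small-sample correction is applied here, whereas statistical packages typically add one.

```python
import random

random.seed(2)
G, T = 200, 5  # assumed number of units (clusters) and periods

data = []  # (cluster id, x, y)
for g in range(G):
    x_g = random.gauss(0, 1)   # regressor varies only across units
    a_g = random.gauss(0, 1)   # persistent unit-level error component
    for _ in range(T):
        y = 1.0 + 0.5 * x_g + a_g + random.gauss(0, 1)
        data.append((g, x_g, y))

n = len(data)
x_bar = sum(x for _, x, _ in data) / n
y_bar = sum(y for _, _, y in data) / n
sxx = sum((x - x_bar) ** 2 for _, x, _ in data)
b1 = sum((x - x_bar) * (y - y_bar) for _, x, y in data) / sxx
b0 = y_bar - b1 * x_bar
resid = [y - b0 - b1 * x for _, x, y in data]

# Conventional SE: treats all n observations as independent and homoskedastic.
s2 = sum(e * e for e in resid) / (n - 2)
se_conventional = (s2 / sxx) ** 0.5

# Cluster-robust SE: sum the scores within each unit, then square.
scores = {}
for (g, x, _), e in zip(data, resid):
    scores[g] = scores.get(g, 0.0) + (x - x_bar) * e
se_cluster = sum(s * s for s in scores.values()) ** 0.5 / sxx

print(f"slope: {b1:.3f}")
print(f"conventional SE: {se_conventional:.4f}")
print(f"clustered SE:    {se_cluster:.4f}")
```

The slope estimate is the same in both cases; only the standard error changes. In this design the conventional SE understates the uncertainty because it counts the repeated observations of each unit as independent information.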

Serial Correlation Only Affects Inference

The Three Pillars: Identification, Estimation, Inference

  • Identification (AS1, AS3-AS5): Can we recover \(\beta\)? Not affected
  • Estimation (AS2): How do we compute \(\hat{\beta}\)? Not affected
  • Inference (AS2, AS7): Are our SEs, p-values, CIs valid?
    • Assumption 2 (i.i.d. sampling): same unit observed repeatedly → errors correlated within unit
    • Assumption 7 (homoskedasticity): error variance may differ across units
    • Both violations affect inference only: \(\hat{\beta}\) unchanged, SEs wrong

Exercise 8: Time Fixed Effects Can Be Collinear with Treatment

Measuring the Effect of Increased Force

All Municipalities Treated Simultaneously

Empirical setup

  • Dependent variable: Drug usage at municipality \(i\) on day \(t\)
  • Treatment: All municipalities increase police on the same date (vertical line in figure)
  • Drug usage has a pre-existing upward trend

Day Fixed Effects Cannot Separate Treatment from Time

The Collinearity Trap

Ideal model: \(\text{drug usage}_{it} = \mu\;\text{post}_t + \theta_i + \rho_t + e_{it}\)

Conditional expectations — with \(T = 4\), treatment at \(t = 3\):

\[\begin{align*} \mathbb{E}[\text{drug usage}_{it} \mid t=1] &= \theta_i + \rho_1 \\ \mathbb{E}[\text{drug usage}_{it} \mid t=2] &= \theta_i + \rho_2 \\ \mathbb{E}[\text{drug usage}_{it} \mid t=3] &= \mu + \theta_i + \rho_3 \\ \mathbb{E}[\text{drug usage}_{it} \mid t=4] &= \mu + \theta_i + \rho_4 \end{align*}\]

  • \((\mu + \rho_3)\) observed jointly, not separately \(\Rightarrow\) \(\mu\) not identified
  • Reason: \(\text{post}_t\) is a linear combination of day dummies

A Linear Time Trend Restores Identification

Parametric but Estimable

\[\text{drug usage}_{it} = \mu\text{post}_t + \theta_i + \gamma t + e_{it}\]

Interpretation of \(\gamma\)

  • Average change per time unit
  • Imposes linearity — may miss curvature
  • Intermediate: polynomial

The trade-off

| | Parametric trend | Day FE |
|---|---|---|
| Flexibility | Low (linear) | High (any shape) |
| Estimate \(\mu\)? | Yes | No (collinear) |
| Risk | Misspecified trend | No identification |

When treatment varies only at the time level, time FE absorb it completely.


Exercise 9: Age Dummies Provide Nonparametric Functional Forms

When We Do Not Know the Functional Form, Use Dummies

Nonparametric Estimation

\[ \log(\text{earnings})_{it} = \alpha_i + \theta_t + \sum_{j=17}^{85} \gamma_j\;\mathbb{1}[\text{age}_{it} = j] + e_{it} \]

  • \(\alpha_i\): individual FE (absorbs ability, education, etc.)
  • \(\theta_t\): time FE (absorbs aggregate macroeconomic trends)
  • \(\gamma_j\): average difference in log-earnings between workers aged \(j\) and workers aged 16, holding constant \(\alpha_i\) and \(\theta_t\)
    • Nonparametric: no assumption on the shape of the age-earnings relation
    • Parametric: quadratic requires \(f(\text{age}) = \beta_1\;\text{age} + \beta_2\;\text{age}^2\)

Why log? APC problem

Imprecision at Extremes Reflects Thin Data

The Variance Formula for Dummy Variables

Each age dummy \(d_j = \mathbb{1}[\text{age}_{it} = j]\) is a binary variable with proportion \(p_j = n_j/n\):

\[ \text{Var}(\hat{\gamma}_j) \propto \frac{\sigma^2}{n \cdot p_j(1 - p_j)} \]

  • Most workers are aged 25-64 \(\Rightarrow\) \(p_j\) relatively large \(\Rightarrow\) \(\text{Var}(\hat{\gamma}_j)\) small
  • Few workers at ages 16-24 and 65-85 \(\Rightarrow\) \(p_j \approx 0\) \(\Rightarrow\) \(\text{Var}(\hat{\gamma}_j)\) large
  • \(\hat{\gamma}_j\) at extreme ages has wide confidence intervals

Nonparametric flexibility comes at the cost of imprecision where data is thin.
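To get a feel for the magnitudes, the sketch below evaluates \(\sigma^2 / (n \cdot p_j(1-p_j))\) for two hypothetical age cells. The sample size and the two shares are assumed values, not figures from the earnings data.

```python
def relative_variance(p, n, sigma2=1.0):
    """sigma^2 / (n * p * (1 - p)), the quantity the variance formula is proportional to."""
    return sigma2 / (n * p * (1 - p))

n = 100_000        # hypothetical number of worker-year observations
p_mid = 0.02       # hypothetical share of observations at one mid-career age
p_extreme = 0.001  # hypothetical share of observations at one extreme age

# Ratio of standard errors: extreme-age dummy relative to mid-career dummy.
se_ratio = (relative_variance(p_extreme, n) / relative_variance(p_mid, n)) ** 0.5
print(f"SE at the extreme age is about {se_ratio:.1f} times the mid-career SE")
```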

Exercise 10: Fixed Effects Decompose Treatment Into Incentive and Selection

Performance Pay Increases Firm-Level Productivity by 20%

But How Much Is Incentives vs Selection?

A firm introduces performance pay. The panel is unbalanced: some workers leave (exiters), some stay (stayers), some join (entrants).

\[ \widehat{\log(\text{productivity})}_{it} = \hat{\alpha}_i + \hat{\beta}\;\text{performance pay}_t \]

OLS (Pooled)

  • \(\hat{\beta}_{\text{OLS}} = 0.20\) (SE \(= 0.03\))
  • Captures total effect at firm level
    • Uses all workers (stayers + exiters + entrants)

Fixed Effects

  • \(\hat{\beta}_{\text{FE}} = 0.10\) (SE \(= 0.02\))
  • Captures within-worker incentive effect only
    • Only stayers contribute to identification

Half the Effect Is Incentives, Half Is Composition

The Decomposition

Incentive effect

  • \(\hat{\beta}_{\text{FE}} = 0.10\)
  • Same workers produce more under performance pay
    • They work harder

Composition effect

  • \(\hat{\beta}_{\text{OLS}} - \hat{\beta}_{\text{FE}} = 0.20 - 0.10 = 0.10\)
  • Different workers join the firm under performance pay
    • \(\mathbb{E}[\alpha_i \mid \text{entrant}] > \mathbb{E}[\alpha_i \mid \text{exiter}]\)

OLS captures total change; FE isolates the within-unit mechanism. The difference is the selection channel.

Formal decomposition

Exercise 11: Worker Composition Changes Require Fixed Effects

This Exercise Combines Three Previous Challenges

Collinearity (Q8) + Composition (Q10) + Time-Varying Controls (New)

\[ \text{productivity}_{it} = \beta_1\;\text{contingent}_t + \beta_2\;\text{weather}_t + \beta_3\;\text{width}_{it} + \beta_4\;\text{height}_{it} + e_{it} \]

Q8 callback: Collinearity with time FE

  • Contingent pay switches on one date for everyone
  • \(\text{contingent}_t\) collinear with day dummies
  • Cannot include time FE → use weather as observable time-varying control

Q10 callback: Negative composition

  • Best workers leave for blueberry harvest (neighbouring farms offer better alternatives)
  • Remaining workers less productive: \[\mathbb{E}[\gamma_i \mid \text{second half}] < \mathbb{E}[\gamma_i \mid \text{first half}]\]
  • Direction flips from Q10 (where better workers joined)

Width and Height Are Time-AND-Individual Varying

Controls vs Fixed Effects Address Different Problems

\[\begin{align*} \text{productivity}_{it} &= \beta_1\;\text{contingent}_t + \beta_2\;\text{weather}_t + \beta_3\;\text{width}_{it} + \beta_4\;\text{height}_{it} \\ &+ \gamma_i + e_{it} \end{align*}\]

  • Width and height vary across workers AND days — not captured by worker FE (\(\gamma_i\) absorbs time-invariant traits only)
  • Still subject to Assumption 5: if width or height is correlated with \(e_{it}\), the bias persists
  • Only workers in both periods contribute — churn shrinks effective sample

Controls handle time-varying confounders; FE handles time-invariant heterogeneity.

Exercise 12: Multiple Fixed Effects Identify Leader Quality Through Rotation

Call Centre Productivity Varies Across Operators, Leaders, and Days

Three Sources of Variation

Setting

  • Outcome: calls answered per hour
  • Panel of operators \(\times\) days
  • Random call assignment: no operator-call selection
  • Operators and leaders rotate across teams
    • Rotation provides identifying variation

Three sources of heterogeneity

  • \(\lambda_i\): operator ability
  • \(\mu_j\): leader quality (the parameter of interest)
  • \(\theta_t\): day-level conditions (holidays, system outages)

Three-Way Fixed Effects Isolate Leader Quality

Rotation Is the Identification Condition

\[ \text{productivity}_{ijt} = \lambda_i + \mu_j + \theta_t + e_{ijt} \]

  • \(\lambda_i\): operator FE — absorbs intrinsic worker ability
  • \(\mu_j\): leader FE — the object of interest
  • \(\theta_t\): day FE — absorbs common daily shocks
  • Identification requires rotation: same operator observed under different leaders
    • Without rotation, \(\lambda_i\) and \(\mu_j\) are not separately identified

The F-Test Provides Evidence for Leader Quality Differences

Joint Significance of Leader Fixed Effects

Restricted vs unrestricted

  • Restricted (\(H_0\) true): \(\text{productivity}_{ijt} = \lambda_i + \mu + \theta_t + e_{ijt}\)
  • Unrestricted (\(H_1\)): \(\text{productivity}_{ijt} = \lambda_i + \mu_j + \theta_t + e_{ijt}\)

Five-step framework

  1. Choose \(\alpha = 0.05\)
  2. \(H_0: \mu_2 = \mu_3 = \cdots = \mu_J = 0\)
  3. \(F = \frac{(\text{RSS}_R - \text{RSS}_U)/q}{\text{RSS}_U/(n - K)}\;,\quad q = J - 1\)
  4. Reject \(H_0\) if \(F > F_{q,\; n-K,\; \alpha}\)
  5. If reject: leaders differ in quality
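Step 3 can be sketched as a small function. The residual sums of squares, the sample size, and the parameter count below are hypothetical inputs; the critical value in step 4 would come from an \(F\) table or statistical software and is not computed here.

```python
def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """F statistic for q restrictions; k is the number of parameters in the unrestricted model."""
    numerator = (rss_restricted - rss_unrestricted) / q
    denominator = rss_unrestricted / (n - k)
    return numerator / denominator

# Hypothetical inputs (assumed values, for illustration only).
J = 5              # number of leaders, so q = J - 1 restrictions
n, k = 2000, 100   # observations and parameters in the unrestricted model
rss_r, rss_u = 2100.0, 1900.0

F = f_statistic(rss_r, rss_u, q=J - 1, n=n, k=k)
print(f"F = {F:.1f} with ({J - 1}, {n - k}) degrees of freedom")
```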

Multiple fixed effects require sufficient rotation across dimensions for identification.

AKM (1999)

Panel Data Provides Identification Through Within-Unit Variation

Summary (I)

  1. FD and LSDV address different components of unobserved heterogeneity
  2. Time effects capture common growth; interactions capture heterogeneous effects
  3. Measurement error is amplified by first-differencing — a fundamental trade-off
  4. Panel data cannot solve selection bias
  5. Clustered standard errors correct for within-unit serial correlation

Panel Data Provides Identification Through Within-Unit Variation

Summary (II)

  1. Time fixed effects can be collinear with treatment
  2. Dummies provide nonparametric functional forms at the cost of imprecision at extremes
  3. Fixed effects decompose total effects into within-unit incentive and composition channels
  4. Multiple fixed effects require rotation for identification; F-tests assess joint significance

Next Week: Discrimination

Empirical Exercise 6

Bertrand & Mullainathan (2004)

  • Are Emily and Greg more employable than Lakisha and Jamal?
  • Resume audit study
    • Randomly assigned racial signals
  • Experimental identification
    • Randomisation breaks OVB
  • Asymmetric returns to resume quality by race

Appendix: Derivations and Extensions

First-Difference Derivation for General Panel Model

Algebra

Write the model for \(t = 1\) and \(t = 2\):

\[\begin{align*} y_{i1} &= \beta_0 + \beta_1 x_{i1,1} + \cdots + \beta_k x_{i1,k} + a_i + v_{i1} \\ y_{i2} &= (\beta_0 + \delta) + \beta_1 x_{i2,1} + \cdots + \beta_k x_{i2,k} + a_i + v_{i2} \end{align*}\]

Subtract: \(a_i - a_i = 0\).

\[ \Delta y_i = \delta + \beta_1\Delta x_{i1} + \beta_2\Delta x_{i2} + \cdots + \beta_k\Delta x_{ik} + \Delta v_i \]

First-Differencing Removes Time-Invariant Unobservables

  • \(a_i\) eliminated by differencing: \(a_i - a_i = 0\)
  • OLS on \(\Delta y_i\) is consistent under strict exogeneity: \(\mathbb{E}[\Delta v_i \mid \Delta x] = 0\)
  • Time-invariant variables (e.g., distance, gender) also drop out
  • Generalises to \(T > 2\): take consecutive differences for each pair \((t, t-1)\)


Attenuation Factor: Cross-Section vs First-Differences

Cross-section

\[ \text{plim}\hat\beta_{\text{CS}} = \beta \cdot \frac{\text{Var}(\text{educ}^*)}{\text{Var}(\text{educ}^*) + \text{Var}(e)} \]

First-differences (assuming uncorrelated measurement errors)

\[ \text{plim}\;\hat\beta_{\text{FD}} = \beta \cdot \frac{\text{Var}(\Delta\text{educ}^*)}{\text{Var}(\Delta\text{educ}^*) + \text{Var}(e_{i1}) + \text{Var}(e_{i2})} \]

Why FD amplifies the bias

  • Numerator shrinks: education barely changes → \(\text{Var}(\Delta\text{educ}^*) \ll \text{Var}(\text{educ}^*)\)
  • Error variance doubles: if the error variance is the same in both periods, \(\text{Var}(e_{i1}) + \text{Var}(e_{i2}) = 2\text{Var}(e)\)


Why Log?

Three Reasons for Using Log-Earnings

  1. Skewness: raw earnings are right-skewed
    • \(\log(\text{earnings})\) yields a more symmetric distribution, closer to normality, and typically a more stable error variance (the homoskedasticity in AS7)
  2. Multiplicative relationships: if earnings = base \(\times\) skill premium \(\times\) experience premium, then
    • \(\log(\text{earnings}) = \log(\text{base}) + \log(\text{skill premium}) + \log(\text{experience premium})\)
    • Log linearises multiplicative structures into additive ones
  3. Growth rates: \(\Delta\log(\text{earnings}) \approx \%\Delta\text{earnings}\)
    • Differences in logs approximate percentage changes
    • The natural unit for comparing workers across different baseline earnings


Age-Period-Cohort Identification Problem

Why Age, Time, and Individual FE Cannot All Be Included

The collinearity

\[ \log(\text{earnings})_{it} = \alpha_i + \theta_t + \sum_{j=17}^{85} \gamma_j\;\mathbb{1}[\text{age}_{it} = j] + e_{it} \]

suffers from a fundamental identification problem:

\[ \text{age}_{it} = \text{year}_t - \text{birth year}_i \]

  • \(\text{birth year}_i\) is time-invariant, so it is absorbed by \(\alpha_i\)
  • \(\text{year}_t\) is captured by \(\theta_t\)
  • Therefore the linear component of age is perfectly collinear with \(\alpha_i\) and \(\theta_t\)

Why this matters

  • Cannot separately identify age, period, cohort without restrictions
  • Normalise: set one age effect to zero
  • See Deaton (2018) and McKenzie (2006)


OLS vs FE Decomposition — Formal

Stayers (\(S\)), Exiters (\(X\)), Entrants (\(N\))

Algebra

\[\begin{align*} \hat{\beta}_{\text{OLS}} &= \bar{y}_{\text{post}} - \bar{y}_{\text{pre}} \\ &= \left[\frac{|S|}{|S|+|N|}\bar{y}^S_{\text{post}} + \frac{|N|}{|S|+|N|}\bar{y}^N_{\text{post}}\right] \\ &\quad - \left[\frac{|S|}{|S|+|X|}\bar{y}^S_{\text{pre}} + \frac{|X|}{|S|+|X|}\bar{y}^X_{\text{pre}}\right] \end{align*}\]

\[ \hat{\beta}_{\text{FE}} = \frac{1}{|S|}\sum_{i \in S}(y_{i,\text{post}} - y_{i,\text{pre}}) \]

Interpretation

\(|S|\), \(|X|\), \(|N|\) = number of stayers, exiters, entrants.

  • \(\hat{\beta}_{\text{OLS}}\): all workers — total effect
  • \(\hat{\beta}_{\text{FE}}\): stayers only — incentive effect
  • Not bias — different quantities
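The two formulas can be applied to a toy dataset. The sketch below uses hypothetical log-productivity values for two stayers, two exiters, and two entrants, chosen so that entrants are more productive than exiters; the numbers are for illustration only.

```python
from statistics import mean

# Hypothetical log-productivity values (assumed, not from the exercise).
stayers = [(1.0, 1.1), (1.2, 1.3)]  # (pre, post) for workers observed in both periods
exiters_pre = [0.8, 0.8]            # observed only before performance pay
entrants_post = [1.4, 1.4]          # observed only after performance pay

pre_all = [pre for pre, _ in stayers] + exiters_pre
post_all = [post for _, post in stayers] + entrants_post

# Pooled comparison: mean after minus mean before, over all workers present.
beta_ols = mean(post_all) - mean(pre_all)

# Fixed effects: average within-worker change, stayers only.
beta_fe = mean(post - pre for pre, post in stayers)

composition = beta_ols - beta_fe
print(f"OLS: {beta_ols:.2f}, FE: {beta_fe:.2f}, composition: {composition:.2f}")
```

Here the stayers each improve by 0.10, while the pooled comparison also picks up the replacement of low-productivity exiters by high-productivity entrants.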


AKM: Matched Employer-Employee Framework

Abowd, Kramarz, and Margolis (1999, Econometrica)

The call centre exercise (Q12) builds on Abowd et al. (1999) and Fenizia (2022).

\[ \log(\text{wages})_{it} = \alpha_i + \psi_{J(i,t)} + x'_{it}\beta + e_{it} \]

  • \(\alpha_i\): worker FE (ability, human capital)
  • \(\psi_{J(i,t)}\): firm FE for the firm \(J\) employing worker \(i\) at time \(t\)
  • Identification requires worker mobility across firms (exogenous mobility)
    • Same logic as Q12: operators rotate across teams
    • Without mobility, \(\alpha_i\) and \(\psi_{J(i,t)}\) not separately identified


Why Day FE and Post Are Collinear

A Concrete Example with Four Days

Treatment starts on day 3:

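| Day | \(d_2\) | \(d_3\) | \(d_4\) | \(\text{post}_t\) |
|-----|---------|---------|---------|-------------------|
| 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 |
| 3 | 0 | 1 | 0 | 1 |
| 4 | 0 | 0 | 1 | 1 |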

\(\text{post}_t = d_3 + d_4\) — an exact linear combination of the day dummies. Stata would drop one variable automatically. The treatment effect \(\mu\) cannot be separated from the day effects.
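The collinearity can be verified by computing the rank of the design matrix. The sketch below extends the four-day example to two municipalities with a municipality dummy (an assumption added so that the matrix has more rows than columns) and uses a small hand-written rank routine with exact fractions rather than a linear algebra library.

```python
from fractions import Fraction

def rank(matrix):
    """Matrix rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

days = [1, 2, 3, 4]
municipalities = [0, 1]  # hypothetical second municipality added for illustration

# Columns: constant, municipality dummy, d2, d3, d4, post (treatment starts on day 3).
X_day_fe = [[1, m, int(t == 2), int(t == 3), int(t == 4), int(t >= 3)]
            for m in municipalities for t in days]

# Columns: constant, municipality dummy, linear trend t, post.
X_trend = [[1, m, t, int(t >= 3)] for m in municipalities for t in days]

rank_day_fe = rank(X_day_fe)
rank_trend = rank(X_trend)
print(f"day FE design: {len(X_day_fe[0])} columns, rank {rank_day_fe}")
print(f"trend design:  {len(X_trend[0])} columns, rank {rank_trend}")
```

With day dummies the rank falls one short of the number of columns, so one coefficient cannot be estimated. With a linear trend the design has full column rank, and adding more municipalities does not change either conclusion.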

Replacing Day FE with a Parametric Trend Restores Identification

  • Resolution: replace \(T-1\) day dummies with a single parametric trend (\(\gamma\;\text{time}_t\))
    • Fewer parameters → no collinearity → \(\mu\) is identified
  • Q8 application: all municipalities receive increased police force simultaneously — \(\text{post}_t\) is perfectly collinear with day FE
  • Q11 application: contingent pay switches on a single date for all workers — same collinearity
  • Trade-off: a linear trend is restrictive (assumes the same change every period) but estimable; day FE are flexible but break identification when treatment is universal


References


Abowd, J. M., Kramarz, F., & Margolis, D. N. (1999). High Wage Workers and High Wage Firms. Econometrica, 67(2), 251–333. https://doi.org/10.1111/1468-0262.00020
Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review, 94(4), 991–1013. https://doi.org/10.1257/0002828042002561
Deaton, A. (2018). The Analysis of Household Surveys: A Microeconometric Approach to Development Policy (Reissue Edition with a New Preface). World Bank Group. https://doi.org/10.1596/978-1-4648-1331-3
Fenizia, A. (2022). Managers and Productivity in the Public Sector. Econometrica, 90(3), 1063–1084. https://doi.org/10.3982/ECTA19244
McKenzie, D. J. (2006). Disentangling Age, Cohort and Time Effects in the Additive Model. Oxford Bulletin of Economics and Statistics, 68(4), 473–495. https://doi.org/10.1111/j.1468-0084.2006.00173.x